130 research outputs found

    Generating Multi-Categorical Samples with Generative Adversarial Networks

    We propose a method to train generative adversarial networks (GANs) on multivariate feature vectors representing multiple categorical values. In contrast to the continuous domain, where GAN-based methods have delivered considerable results, GANs struggle to perform equally well on discrete data. We propose and compare several architectures based on multiple (Gumbel) softmax output layers that take the structure of the data into account. We evaluate the performance of our architecture on datasets with different sparsity, number of features, ranges of categorical values, and dependencies among the features. Our proposed architecture and method outperform existing models.
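As a rough illustration of the "multiple (Gumbel) softmax output layers" idea, the sketch below splits a flat vector of logits into one block per categorical variable and draws a relaxed one-hot sample from each block. It is a minimal stdlib-only sketch, not the authors' architecture: the function names, the `sizes` list, and the temperature value are assumptions made for the example.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return one relaxed one-hot sample for a single categorical variable."""
    perturbed = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1).
        u = max(rng.random(), 1e-12)  # clamp to avoid log(0)
        perturbed.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed logits.
    top = max(perturbed)
    exps = [math.exp(v - top) for v in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


def multi_categorical_output(flat_logits, sizes, temperature, rng):
    """Split flat logits into one block per variable and sample each block."""
    if sum(sizes) != len(flat_logits):
        raise ValueError("sizes must add up to the number of logits")
    output, start = [], 0
    for size in sizes:
        block = flat_logits[start:start + size]
        output.extend(gumbel_softmax(block, temperature, rng))
        start += size
    return output


rng = random.Random(0)
sizes = [3, 2, 4]  # hypothetical: three variables with 3, 2 and 4 categories
sample = multi_categorical_output([0.0] * 9, sizes, temperature=0.5, rng=rng)
```

Each block of `sample` is a probability vector over the categories of one variable, so the concatenated output respects the per-variable structure of the data. Lower temperatures push each block closer to a hard one-hot vector.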

    Human in the Loop: Interactive Passive Automata Learning via Evidence-Driven State-Merging Algorithms

    We present an interactive version of an evidence-driven state-merging (EDSM) algorithm for learning variants of finite state automata. Learning these automata often amounts to recovering or reverse engineering the model that generated the data, despite noisy, incomplete, or imperfectly sampled data sources, rather than optimizing a purely numeric target function. Domain expertise and human knowledge about the target domain can guide this process and are typically captured in parameter settings. Often, domain expertise is subconscious and not expressed explicitly. Directly interacting with the learning algorithm makes it easier to utilize this knowledge effectively. Comment: 4 pages, presented at the Human in the Loop workshop at ICML 201

    Improving Missing Data Imputation with Deep Generative Models

    Datasets with missing values are very common in industry applications, and they can have a negative impact on machine learning models. Recent studies introduced solutions to the problem of imputing missing values based on deep generative models. Previous experiments with Generative Adversarial Networks and Variational Autoencoders showed interesting results in this domain, but it is not clear which method is preferable for different use cases. The goal of this work is twofold: we present a comparison between missing data imputation solutions based on deep generative models, and we propose improvements over those methodologies. We run our experiments using known real-life datasets with different characteristics, removing values at random and reconstructing them with several imputation techniques. Our results show that the presence or absence of categorical variables can alter the selection of the best model, and that some models are more stable than others after similar runs with different random number generator seeds.
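The evaluation protocol described here (remove values at random, reconstruct them, compare with the originals) can be sketched as follows. This is only an illustration under stated assumptions: a simple mean/mode baseline stands in for the deep generative imputers compared in the paper, and all function names and the toy table are hypothetical.

```python
import math
import random
from statistics import mean, mode


def mask_at_random(rows, missing_rate, rng):
    """Replace each cell with None independently with probability missing_rate."""
    return [[None if rng.random() < missing_rate else v for v in row] for row in rows]


def impute_mean_mode(masked):
    """Baseline: column mean for numeric columns, column mode for categorical ones."""
    fills = []
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        if not observed:
            fills.append(None)  # nothing observed, so nothing to impute from
        elif all(isinstance(v, (int, float)) for v in observed):
            fills.append(mean(observed))
        else:
            fills.append(mode(observed))
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in masked]


def numeric_rmse(original, masked, imputed, column):
    """RMSE over the masked cells of one numeric column.

    Assumes the column had at least one observed value, so every fill is a number.
    """
    errors = [
        (orig[column] - imp[column]) ** 2
        for orig, msk, imp in zip(original, masked, imputed)
        if msk[column] is None
    ]
    return math.sqrt(mean(errors)) if errors else 0.0


# Hypothetical toy table: one numeric and one categorical column.
original = [[1.0, "a"], [2.0, "a"], [3.0, "b"], [6.0, "b"]]
# A hand-written mask keeps the example deterministic; mask_at_random
# would be used with a seeded rng in a real run.
masked = [[1.0, "a"], [None, "a"], [3.0, None], [None, "b"]]
imputed = impute_mean_mode(masked)
error = numeric_rmse(original, masked, imputed, column=0)
```

Repeating the run with different seeds for `mask_at_random` gives a rough sense of how stable an imputer is, which is the kind of comparison the abstract refers to.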

    Minority Class Oversampling for Tabular Data with Deep Generative Models

    In practice, machine learning experts are often confronted with imbalanced data. Without accounting for the imbalance, common classifiers perform poorly and standard evaluation metrics mislead practitioners about the model's performance. A common method to treat imbalanced datasets is under- and oversampling. In this process, samples are either removed from the majority class or synthetic samples are added to the minority class. In this paper, we follow up on recent developments in deep learning. We take proposals of deep generative models, including our own, and study the ability of these approaches to provide realistic samples that improve performance on imbalanced classification tasks via oversampling. Across 160K+ experiments, we show that all of the new methods tend to perform better than simple baseline methods such as SMOTE, but require different under- and oversampling ratios to do so. Our experiments show that the method of sampling does not affect quality, but runtime varies widely. We also observe that the improvements in terms of performance metrics, while shown to be significant when ranking the methods, often are minor in absolute terms, especially compared to the required effort. Furthermore, we notice that a large part of the improvement is due to undersampling, not oversampling. We make our code and testing framework available.
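The combination of under- and oversampling described above can be sketched as below. This is a minimal sketch, not the paper's framework: the two ratio parameters, the function name, and the optional `generate` callback are assumptions, and when no generator is supplied the code falls back to duplicating minority samples with replacement (plain random oversampling, not SMOTE and not a deep generative model).

```python
import random


def resample(majority, minority, under_ratio, over_ratio, rng, generate=None):
    """Undersample the majority class and oversample the minority class.

    under_ratio: fraction of majority samples to keep (0 < under_ratio <= 1).
    over_ratio: number of added minority samples per original minority sample.
    generate: optional zero-argument callable returning one synthetic sample.
    """
    n_keep = round(len(majority) * under_ratio)
    kept_majority = rng.sample(majority, n_keep)  # without replacement

    if generate is None:
        # Fallback assumption: duplicate existing minority samples at random.
        def generate():
            return rng.choice(minority)

    n_new = round(len(minority) * over_ratio)
    synthetic = [generate() for _ in range(n_new)]
    return kept_majority, minority + synthetic


rng = random.Random(0)
majority = list(range(100))       # hypothetical majority-class samples
minority = list(range(100, 110))  # hypothetical minority-class samples
kept, boosted = resample(majority, minority, under_ratio=0.5, over_ratio=2.0, rng=rng)
```

With these settings the class ratio moves from 100:10 to 50:30. Passing a sampler from a trained generative model as `generate` would replace the duplication fallback with synthetic samples.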

    „Gaudeamus igitur?“ – Prevalence and factors of influence on alcohol consumption by students

    The aim of this study was to examine alcohol consumption habits in a subpopulation that has so far received little attention: students. To this end, prevalences were determined for the amount consumed, consumption patterns, binge drinking, and alcohol-related disorders such as abuse or dependence. In addition, possible factors influencing students' alcohol consumption were examined: demographic factors, the structure of the degree programme (particularly against the background of the restructuring of degrees under the Bologna agreement), and general psychological strain. The data were collected in 2008 through an online survey of 2,348 students at three universities in Lower Saxony: the TU Braunschweig, the HBK Braunschweig (University of Arts), and the University of Applied Sciences Braunschweig/Wolfenbüttel. Subsequently, a structured clinical interview was conducted with 72 participants of the online survey to assess the quality of the online questionnaire and to examine pathological drinking patterns and psychological abnormalities more closely. The results show markedly elevated alcohol consumption: with respect to the preceding 30 days, only 10.2% of students reported being abstinent, 70.8% drank at low risk, and 19% consumed alcohol at an at least risky level. This is also reflected in binge drinking: 34.9% of students were binge drinkers, and a further 14.5% were so-called heavy users, who drank to intoxication on five or more days per month. According to an alcoholism screening instrument (CAGE), 30.3% showed at least abusive, if not dependent, consumption. The results reveal consumption patterns that differ clearly from those of the general population, including people of the same age. However, the reasons appear to lie less in internal conditions, such as coping with depression or anxiety, than in socially motivated factors, some of which even have a positive effect on psychological well-being. Further explanations are discussed.

    Acoustic structure of male loud-calls support molecular phylogeny of Sumatran and Javanese leaf monkeys (genus Presbytis)

    Background: The degree to which loud-calls in nonhuman primates can be used as a reliable taxonomic tool is the subject of ongoing debate. A recent study on crested gibbons showed that these species can be well distinguished by their songs; even at the population level the authors found reliable differences. Although there are some further studies on geographic and phylogenetic differences in loud-calls of nonhuman primate species, it is unclear to what extent loud-calls of other species have a similarly close relation between acoustic structure, phylogenetic relatedness and geographic distance. We therefore conducted a field survey in 19 locations on Sumatra, Java and the Mentawai islands to record male loud-calls of wild surilis (Presbytis), a genus of Asian leaf monkeys (Colobinae) with disputed taxonomy, and compared the structure of their loud-calls with a molecular genetic analysis. Results: The acoustic analysis of 100 surili male loud-calls from 68 wild animals confirms the differentiation of P. potenziani, P. comata, P. thomasi and P. melalophos. In a more detailed acoustic analysis of subspecies of P. melalophos, a further separation of the southern P. m. mitrata confirms the proposed paraphyly of this group. In concordance with their geographic distribution, we found the highest correlation between call structure and genetic similarity, and less significant correlations between call structure and geographic distance, and between genetic similarity and geographic distance. Conclusions: In this study we show that, as in crested gibbons, the acoustic structure of surili loud-calls is a reliable tool to distinguish between species and to verify phylogenetic relatedness and migration backgrounds of the respective taxa. Since vocal production in other nonhuman primates shows similar constraints, it is likely that an acoustic analysis of call structure can help to clarify taxonomic and phylogenetic relationships.

